{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用 tiktoken 计算 token 数量\n",
    "\n",
    "[`tiktoken`](https://github.com/openai/tiktoken/blob/main/README.md)是OpenAI开发的一种BPE分词器。\n",
    "\n",
    "给定一段文本字符串（例如，`\"tiktoken is great!\"`）和一种编码方式（例如，`\"cl100k_base\"`），分词器可以将文本字符串切分成一系列的token（例如，`[\"t\", \"ik\", \"token\", \" is\", \" great\", \"!\"]`）。\n",
    "\n",
    "将文本字符串切分成token非常有用，因为GPT模型看到的文本就是以token的形式呈现的。知道一段文本字符串中有多少个token可以告诉你（a）这个字符串是否对于文本模型来说太长了而无法处理，以及（b）一个OpenAI API调用的费用是多少（因为使用量是按照token计价的）。\n",
    "\n",
    "## 编码方式\n",
    "\n",
    "编码方式规定了如何将文本转换成token。不同的模型使用不同的编码方式。\n",
    "\n",
    "`tiktoken`支持OpenAI模型使用的三种编码方式：\n",
    "\n",
    "| 编码名称           | OpenAI模型                                       |\n",
    "|-------------------------|-----------------------------------------------------|\n",
    "| `cl100k_base`           | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`  |\n",
    "| `p50k_base`             | Codex模型, `text-davinci-002`, `text-davinci-003`|\n",
    "| `r50k_base` (或 `gpt2`) | 像 `davinci` 这样的GPT-3模型                         |\n",
    "\n",
    "你可以使用 `tiktoken.encoding_for_model()` 获取一个模型的编码方式，如下所示：\n",
    "```python\n",
    "encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
    "```\n",
    "\n",
    "注意，`p50k_base` 与 `r50k_base` 有很大的重叠，对于非代码应用，它们通常会产生相同的token。\n",
    "\n",
    "## 不同语言的分词器库\n",
    "\n",
    "对于 `cl100k_base` 和 `p50k_base` 编码方式：\n",
    "\n",
    "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
    "- .NET / C#: [SharpToken](https://github.com/dmitry-brazhenko/SharpToken), [TiktokenSharp](https://github.com/aiqinxuancai/TiktokenSharp)\n",
    "- Java: [jtokkit](https://github.com/knuddelsgmbh/jtokkit)\n",
    "\n",
    "对于 `r50k_base` (`gpt2`) 编码方式，许多语言都提供了分词器。\n",
    "\n",
    "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (或者另选 [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))\n",
    "- JavaScript: [gpt-3-encoder](https://www.npmjs.com/package/gpt-3-encoder)\n",
    "- .NET / C#: [GPT Tokenizer](https://github.com/dluc/openai-tools)\n",
    "- Java: [gpt2-tokenizer-java](https://github.com/hyunwoongko/gpt2-tokenizer-java)\n",
    "- PHP: [GPT-3-Encoder-PHP](https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP)\n",
    "\n",
    "（OpenAI对第三方库不做任何背书或保证。）\n",
    "\n",
    "## 如何进行通常的分词操作\n",
    "\n",
    "在英语中，token的长度通常在一个字符到一个单词之间变化（例如，`\"t\"` 或 `\" great\"`），尽管在某些语言中，token可以比一个字符短或比一个单词长。空格通常与单词的开头一起分组（例如，`\" is\"` 而不是 `\"is \"` 或 `\" \"`+`\"is\"`）。你可以快速在 [OpenAI分词器](https://beta.openai.com/tokenizer) 检查一段字符串如何被分词。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. 安装 `tiktoken`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: tiktoken in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (0.4.0)\n",
      "Requirement already satisfied: regex>=2022.1.18 in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (from tiktoken) (2023.5.5)\n",
      "Requirement already satisfied: requests>=2.26.0 in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (from tiktoken) (2.31.0)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (1.26.16)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (3.4)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (3.1.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /root/miniconda3/envs/langchain/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (2023.5.7)\n",
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!pip install --upgrade tiktoken"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import `tiktoken`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load an encoding\n",
    "\n",
    "使用`tiktoken.get_encoding()`按名称加载编码。\n",
    "\n",
    "第一次运行时，它将需要互联网连接进行下载。后续运行不需要互联网连接。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "encoding = tiktoken.get_encoding(\"cl100k_base\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用`tiktoken.encoding_for_model()`函数可以自动加载给定模型名称的正确编码。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "encoding = tiktoken.encoding_for_model(\"gpt-3.5-turbo\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Turn text into tokens with `encoding.encode()`\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `.encode()` method converts a text string into a list of token integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[83, 1609, 5963, 374, 2294, 0]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "encoding.encode(\"tiktoken is great!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过计算`.encode()`返回的列表的长度来统计token数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def num_tokens_from_string(string: str, encoding_name: str) -> int:\n",
    "    \"\"\"返回文本字符串中的Token数量\"\"\"\n",
    "    encoding = tiktoken.get_encoding(encoding_name)\n",
    "    num_tokens = len(encoding.encode(string))\n",
    "    return num_tokens\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "num_tokens_from_string(\"tiktoken is great!\", \"cl100k_base\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Turn tokens into text with `encoding.decode()`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.decode()`将一个token整数列表转换为字符串。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'tiktoken is great!'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "encoding.decode([83, 1609, 5963, 374, 2294, 0])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'!\"#$'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "encoding.decode([0,1,2,3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意：尽管`.decode()`可以应用于单个token，但对于不在 utf-8 边界上的token来说，解码可能会有损失或错误。**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于单个token，`.decode_single_token_bytes()` 安全地将单个整数token转换为其表示的字节。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[b't', b'ik', b'token', b' is', b' great', b'!']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[encoding.decode_single_token_bytes(token) for token in [83, 1609, 5963, 374, 2294, 0]]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "（在字符串前面的`b`表示这些字符串是字节字符串。）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Comparing encodings\n",
    "\n",
    "不同的编码方式在分割单词、处理空格和非英文字符方面存在差异。通过上述方法，我们可以比较几个示例字符串在不同的编码方式下的表现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compare_encodings(example_string: str) -> None:\n",
    "    \"\"\"Prints a comparison of three string encodings.\"\"\"\n",
    "    # print the example string\n",
    "    print(f'\\nExample string: \"{example_string}\"')\n",
    "    # for each encoding, print the # of tokens, the token integers, and the token bytes\n",
    "    for encoding_name in [\"gpt2\", \"p50k_base\", \"cl100k_base\"]:\n",
    "        encoding = tiktoken.get_encoding(encoding_name)\n",
    "        token_integers = encoding.encode(example_string)\n",
    "        num_tokens = len(token_integers)\n",
    "        token_bytes = [encoding.decode_single_token_bytes(token) for token in token_integers]\n",
    "        print()\n",
    "        print(f\"{encoding_name}: {num_tokens} tokens\")\n",
    "        print(f\"token integers: {token_integers}\")\n",
    "        print(f\"token bytes: {token_bytes}\")\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Example string: \"antidisestablishmentarianism\"\n",
      "\n",
      "gpt2: 5 tokens\n",
      "token integers: [415, 29207, 44390, 3699, 1042]\n",
      "token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']\n",
      "\n",
      "p50k_base: 5 tokens\n",
      "token integers: [415, 29207, 44390, 3699, 1042]\n",
      "token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']\n",
      "\n",
      "cl100k_base: 6 tokens\n",
      "token integers: [519, 85342, 34500, 479, 8997, 2191]\n",
      "token bytes: [b'ant', b'idis', b'establish', b'ment', b'arian', b'ism']\n"
     ]
    }
   ],
   "source": [
    "compare_encodings(\"antidisestablishmentarianism\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Example string: \"2 + 2 = 4\"\n",
      "\n",
      "gpt2: 5 tokens\n",
      "token integers: [17, 1343, 362, 796, 604]\n",
      "token bytes: [b'2', b' +', b' 2', b' =', b' 4']\n",
      "\n",
      "p50k_base: 5 tokens\n",
      "token integers: [17, 1343, 362, 796, 604]\n",
      "token bytes: [b'2', b' +', b' 2', b' =', b' 4']\n",
      "\n",
      "cl100k_base: 7 tokens\n",
      "token integers: [17, 489, 220, 17, 284, 220, 19]\n",
      "token bytes: [b'2', b' +', b' ', b'2', b' =', b' ', b'4']\n"
     ]
    }
   ],
   "source": [
    "compare_encodings(\"2 + 2 = 4\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Example string: \"お誕生日おめでとう\"\n",
      "\n",
      "gpt2: 14 tokens\n",
      "token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 29557]\n",
      "token bytes: [b'\\xe3\\x81', b'\\x8a', b'\\xe8\\xaa', b'\\x95', b'\\xe7\\x94\\x9f', b'\\xe6\\x97', b'\\xa5', b'\\xe3\\x81', b'\\x8a', b'\\xe3\\x82', b'\\x81', b'\\xe3\\x81\\xa7', b'\\xe3\\x81\\xa8', b'\\xe3\\x81\\x86']\n",
      "\n",
      "p50k_base: 14 tokens\n",
      "token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 29557]\n",
      "token bytes: [b'\\xe3\\x81', b'\\x8a', b'\\xe8\\xaa', b'\\x95', b'\\xe7\\x94\\x9f', b'\\xe6\\x97', b'\\xa5', b'\\xe3\\x81', b'\\x8a', b'\\xe3\\x82', b'\\x81', b'\\xe3\\x81\\xa7', b'\\xe3\\x81\\xa8', b'\\xe3\\x81\\x86']\n",
      "\n",
      "cl100k_base: 9 tokens\n",
      "token integers: [33334, 45918, 243, 21990, 9080, 33334, 62004, 16556, 78699]\n",
      "token bytes: [b'\\xe3\\x81\\x8a', b'\\xe8\\xaa', b'\\x95', b'\\xe7\\x94\\x9f', b'\\xe6\\x97\\xa5', b'\\xe3\\x81\\x8a', b'\\xe3\\x82\\x81', b'\\xe3\\x81\\xa7', b'\\xe3\\x81\\xa8\\xe3\\x81\\x86']\n"
     ]
    }
   ],
   "source": [
    "compare_encodings(\"お誕生日おめでとう\")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Counting tokens for chat completions API calls\n",
    "\n",
    "ChatGPT模型，如gpt-3.5-turbo和gpt-4，与旧的完成模型一样使用token，但由于其基于消息的格式，很难准确计算对话中将使用多少个token。\n",
    "\n",
    "下面是一个示例函数，用于计算传递给gpt-3.5-turbo或gpt-4的消息中的token数量。\n",
    "\n",
    "请注意，从消息中计算token的确切方式可能因模型而异。请将下面函数中的计数视为估计值，并非永恒保证。\n",
    "\n",
    "特别地，在使用可选功能输入(input)的请求上方会消耗额外的token。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义函数 num_tokens_from_messages，该函数返回由一组消息所使用的token数。\n",
    "def num_tokens_from_messages(messages, model=\"gpt-3.5-turbo\"):\n",
    "    \"\"\"Return the number of tokens used by a list of messages.\"\"\"\n",
    "    # 尝试获取模型的编码\n",
    "    try:\n",
    "        encoding = tiktoken.encoding_for_model(model)\n",
    "    except KeyError:\n",
    "        # 如果模型没有找到，使用 cl100k_base 编码并给出警告\n",
    "        print(\"Warning: model not found. Using cl100k_base encoding.\")\n",
    "        encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
    "    # 针对不同的模型设置token数量\n",
    "    if model in {\n",
    "        \"gpt-3.5-turbo-0613\",\n",
    "        \"gpt-3.5-turbo-16k-0613\",\n",
    "        \"gpt-4-0314\",\n",
    "        \"gpt-4-32k-0314\",\n",
    "        \"gpt-4-0613\",\n",
    "        \"gpt-4-32k-0613\",\n",
    "        }:\n",
    "        tokens_per_message = 3\n",
    "        tokens_per_name = 1\n",
    "    elif model == \"gpt-3.5-turbo-0301\":\n",
    "        tokens_per_message = 4  # 每条消息遵循 {role/name}\\n{content}\\n 格式\n",
    "        tokens_per_name = -1  # 如果有名字，角色会被省略\n",
    "    elif \"gpt-3.5-turbo\" in model:\n",
    "        # 对于 gpt-3.5-turbo 模型可能会有更新，此处返回假设为 gpt-3.5-turbo-0613 的token数量，并给出警告\n",
    "        print(\"Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.\")\n",
    "        return num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0613\")\n",
    "    elif \"gpt-4\" in model:\n",
    "        # 对于 gpt-4 模型可能会有更新，此处返回假设为 gpt-4-0613 的token数量，并给出警告\n",
    "        print(\"Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.\")\n",
    "        return num_tokens_from_messages(messages, model=\"gpt-4-0613\")\n",
    "    elif model in {\n",
    "        \"davinci\",\n",
    "        \"curie\",\n",
    "        \"babbage\",\n",
    "        \"ada\"\n",
    "        }:\n",
    "        print(\"Warning: gpt-3 related model is used. Returning num tokens assuming gpt2.\")\n",
    "        encoding = tiktoken.get_encoding(\"gpt2\")\n",
    "        num_tokens = 0\n",
    "        # only calc the content\n",
    "        for message in messages:\n",
    "            for key, value in message.items():\n",
    "                if key == \"content\":\n",
    "                    num_tokens += len(encoding.encode(value))\n",
    "        return num_tokens\n",
    "    else:\n",
    "        # 对于没有实现的模型，抛出未实现错误\n",
    "        raise NotImplementedError(\n",
    "            f\"\"\"num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.\"\"\"\n",
    "        )\n",
    "    num_tokens = 0\n",
    "    # 计算每条消息的token数\n",
    "    for message in messages:\n",
    "        num_tokens += tokens_per_message\n",
    "        for key, value in message.items():\n",
    "            num_tokens += len(encoding.encode(value))\n",
    "            if key == \"name\":\n",
    "                num_tokens += tokens_per_name\n",
    "    num_tokens += 3  # 每条回复都以助手为首\n",
    "    return num_tokens\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gpt-3.5-turbo-0613\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-3.5-turbo\n",
      "Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-4-0613\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-4\n",
      "Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "davinci\n",
      "Warning: gpt-3 related model is used. Returning num tokens assuming gpt2.\n",
      "89 prompt tokens counted by num_tokens_from_messages().\n",
      "89 prompt tokens counted by the OpenAI API.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 让我们验证上面的函数是否与OpenAI API的响应匹配\n",
    "\n",
    "import openai\n",
    "\n",
    "example_messages = [\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"content\": \"You are a helpful, pattern-following assistant that translates corporate jargon into plain English.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_user\",\n",
    "        \"content\": \"New synergies will help drive top-line growth.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_assistant\",\n",
    "        \"content\": \"Things working well together will increase revenue.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_user\",\n",
    "        \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_assistant\",\n",
    "        \"content\": \"Let's talk later when we're less busy about how to do better.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\",\n",
    "    },\n",
    "]\n",
    "\n",
    "for model in [\n",
    "    \"gpt-3.5-turbo-0613\",\n",
    "    \"gpt-3.5-turbo\",\n",
    "    \"gpt-4-0613\",\n",
    "    \"gpt-4\",\n",
    "    ]:\n",
    "    print(model)\n",
    "    # example token count from the function defined above\n",
    "    print(f\"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().\")\n",
    "    # example token count from the OpenAI API\n",
    "    response = openai.ChatCompletion.create(\n",
    "        model=model,\n",
    "        messages=example_messages,\n",
    "        temperature=0,\n",
    "        max_tokens=1,  # we're only counting input tokens here, so let's not waste tokens on the output\n",
    "    )\n",
    "    print(f'{response[\"usage\"][\"prompt_tokens\"]} prompt tokens counted by the OpenAI API.')\n",
    "    print()\n",
    "\n",
    "for model in [\n",
    "    \"davinci\"\n",
    "    ]:\n",
    "    print(model)\n",
    "    # example token count from the function defined above\n",
    "    print(f\"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().\")\n",
    "    prompt_messages = \"\"\n",
    "    # only calc the content\n",
    "    for message in example_messages:\n",
    "        for key, value in message.items():\n",
    "            if key == \"content\":\n",
    "                prompt_messages += value\n",
    "    # example token count from the OpenAI API\n",
    "    response = openai.Completion.create(\n",
    "        model=model,\n",
    "        prompt=prompt_messages,\n",
    "        temperature=0,\n",
    "        max_tokens=1,  # we're only counting input tokens here, so let's not waste tokens on the output\n",
    "    )\n",
    "    print(f'{response[\"usage\"][\"prompt_tokens\"]} prompt tokens counted by the OpenAI API.')\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  },
  "vscode": {
   "interpreter": {
    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
